The primary results of most observations of cosmic microwave background (CMB) anisotropy are
estimates of the angular power spectrum averaged through some broad band, called band-powers.
These estimates are in turn used to produce constraints on cosmological parameters from the
combined CMB observations. Essential to this estimation of cosmological parameters is the calculation
of the expected band-power for a given experiment, given a theoretical power spectrum. Here we 
derive the "band-power" window function which should be used for this calculation, and point out 
that it is not equivalent to the window function used to calculate the variance. This important 
distinction has been absent from much of the literature: the variance window function is often used 
as the band-power window function. We discuss the validity of this assumed equivalence, the role 
of window functions for experiments that constrain the power in multiple bands, and summarize a 
prescription for reporting experimental results. The analysis methods detailed here are applied in a 
companion paper to three years of data from the Medium Scale Anisotropy Measurement. 

I. INTRODUCTION

Measurement of the anisotropy of the Cosmic Microwave Background (CMB) is proving to be a
powerful cosmological probe. However, exact statistical treatment of the data is complicated
and time-consuming, and promises to become prohibitively so in the very near future. This
difficulty explains why no one has calculated the likelihood of cosmological parameters,
given the available data from all CMB experiments.

Instead, constraints on cosmological parameters have been derived by approximate methods,
namely the use of "radical compression" [1]. Reduction of CMB data to estimates of the
angular power spectrum, $C_l$, can be viewed as a form of data compression. These compressed
data are then in turn used to constrain cosmological parameters. Usually, the compression is
not to estimates of the individual $C_l$s themselves, but to band-powers [?], which are
averages of the power spectrum through a certain filter, or window function.

These band-powers, together with their window functions, have traditionally been the main
results of CMB experiments. Unfortunately, many papers reporting experimental results provide
only what we will call the variance window function, and not the "band-power" window function.
Indeed, the distinction between the two has been missing in much of the literature. They are
not equivalent, except in the limit of vanishing off-diagonal signal correlations. Reports of
constraints on the CMB power spectrum should contain the band-power window function, together
with the quantification of the uncertainty in the band-power.

Not all reductions of CMB data to power spectrum estimates have been presented with only
variance window functions. As a rough guide, those that have been analyzed with "quadratic
estimators" have the right (band-power) window function while those that have been analyzed
by evaluation of the likelihood function do not. The persistence of this confusion is probably
due to the fact that likelihood analysis obscures the relation between the data and the
derived band-power, which is much clearer when one uses an estimator. It has been shown [3]
that a particular quadratic estimator [?] is guaranteed to produce the maximum-likelihood
result (if used iteratively) and below we exploit this fact to derive the expression for the
band-power window function appropriate for likelihood analysis.

In a companion paper [?] we apply the radical compression procedures detailed here to three
years of data from the Medium Scale Anisotropy Measurement (MSAM). This procedure is a
combination of techniques developed in [?] and here. The application to MSAM strongly
demonstrates the power of this method compared to the usual approach of compression to flat
band-powers. In particular, the analysis in the companion paper results in a new and
significant constraint at $l \sim 400$, a theoretically very interesting region of the power
spectrum, which had been previously obscured by use of variance window functions, rather than
band-power window functions.



II. THE BAND-POWER METHOD 

A very useful meeting point for theory and experiment is provided by the band-power. An
important property of a meeting point is that both parties planning on meeting should be able
to get there. Although the directions for going from the data to the band-power are already
clearly explained in the literature (and will be reviewed below), those for going from the
theory to the band-power are not. The point of this paper is to provide those directions,
which are clearly essential to the confrontation.

We now explicitly define the band-power method [?] which has been used by many authors, e.g.,
[?]. In the simplest case of a dataset that produces one band-power, its calculation is
conceptually straightforward: the power spectrum is assumed to be flat ($\mathcal{C}_l$
independent of $l$, where $\mathcal{C}_l \equiv l(l+1)C_l/(2\pi)$) and the band-power estimate
is taken to be the amplitude of $\mathcal{C}_l$ estimated from the data (e.g., via likelihood
analysis). For dataset $B$, let us call this maximum-likelihood value $\mathcal{C}_B$. Let us
further assume that the uncertainty in $\mathcal{C}_B$ is Gaussian-distributed with variance
$\sigma_B^2$.

Because theoretical power spectra are not flat, the relation between a theoretical power
spectrum and the prediction of that theory for the $\mathcal{C}_B$ derived from data is
non-trivial. The theoretical prediction is simply the expectation value of $\mathcal{C}_B$,
given that the theory is true. Since $\mathcal{C}_B$ is a determination of the amplitude of
the power spectrum we will assume a linear dependence of its expectation value on the power
spectrum of the theory, specified by the band-power window function, $W^B_l$:

$$\langle \mathcal{C}_B \rangle = \sum_l \frac{W^B_l}{l}\, \mathcal{C}_l(a_p) \qquad (1)$$

where $a_p$ are the parameters of the theory. These parameters, $a_p$, could be cosmological
parameters (e.g., $\Omega_b$, $\Omega_\Lambda$, $H_0$, etc.) or parameters from a
phenomenological power spectrum. Throughout we will assume that $W^B_l$ is normalized so that
$\langle \mathcal{C}_B \rangle = \mathcal{C}_l$ for $\mathcal{C}_l$ independent of $l$, i.e.,
$\sum_l W^B_l/l = 1$.

With the assumptions of independence and Gaussianity for the uncertainty in $\mathcal{C}_B$
and the specification of the linear relationship between $\mathcal{C}_l$ and
$\langle \mathcal{C}_B \rangle$, it follows that the likelihood of the parameters is maximized
by minimizing the following $\chi^2$:

$$\chi^2(a_p) = \sum_B \frac{\left[ \mathcal{C}_B - \sum_l (W^B_l/l)\, \mathcal{C}_l(a_p) \right]^2}{\sigma_B^2} \qquad (2)$$

This $\chi^2$ represents the confrontation between theory and data that occurs at the meeting
point of the band-power. Use of this equation, or ones similar to it, is very efficient and is
what has been used in analyses of the constraints placed on parameters due to available CMB
data. Note that $W^B_l$ projects the theory into the "plane" of the experiment, or rather the
same plane into which the experimental data have also been reduced.
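
The confrontation described above can be sketched numerically. The following is a minimal illustration, not the analysis code of any experiment: the function names, the two-band layout and all numbers are hypothetical, and the band-power errors are taken to be independent and Gaussian as assumed in the text.

```python
import numpy as np

def predicted_band_powers(window_over_l, theory_cl):
    """Expected band-powers: <C_B> = sum_l (W^B_l / l) * C_l for each band B.

    window_over_l: array (n_bands, n_ell) holding W^B_l / l, each row summing to 1.
    theory_cl: array (n_ell,) holding the theory's l(l+1)C_l/(2 pi).
    """
    return window_over_l @ theory_cl

def band_power_chi2(measured, sigma, window_over_l, theory_cl):
    """chi^2 between measured and predicted band-powers (independent Gaussian errors)."""
    residual = measured - predicted_band_powers(window_over_l, theory_cl)
    return float(np.sum((residual / sigma) ** 2))

# Hypothetical two-band experiment on multipoles l = 2..11 (illustrative numbers only).
ells = np.arange(2, 12)
window_over_l = np.zeros((2, ells.size))
window_over_l[0, :5] = 0.2   # band 1: uniform weight over l = 2..6
window_over_l[1, 5:] = 0.2   # band 2: uniform weight over l = 7..11

flat_theory = np.full(ells.size, 1000.0)   # flat spectrum, in (uK)^2
measured = np.array([1000.0, 1100.0])      # hypothetical measured band-powers
sigma = np.array([100.0, 50.0])            # hypothetical 1-sigma uncertainties

print(predicted_band_powers(window_over_l, flat_theory))  # flat amplitude in both bands
print(band_power_chi2(measured, sigma, window_over_l, flat_theory))  # (0/100)^2 + (100/50)^2
```

Because each row of `window_over_l` sums to one, a flat theory is returned unchanged, which is the normalization convention adopted in the text.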

Previous work has focused on generalizing Eq. 2 to take into account the non-Gaussianity and
dependence of the uncertainties in $\mathcal{C}_B$ [?]. Here we focus on the choice of
$W^B_l$. It is often assumed to be equal to the variance window function, $W^V_l$, which is
actually the diagonal element of a window function matrix which specifies the relationship
between the angular power spectrum and the covariance matrix of the signal, $S$:

$$S_{pp'} \equiv \langle s_p s_{p'} \rangle = \sum_l \frac{W_{l,pp'}}{l}\, \mathcal{C}_l \qquad (3)$$

where $s_p$ is the signal contribution to the $p$th element of a dataset and the brackets
indicate ensemble average. For a single demodulation, all the diagonal elements of the window
function matrix are equal and that is why we can speak of the diagonal element.

While using the variance window function to calculate the signal covariance matrix is correct
(this is what the variance window function is defined to do, see, e.g., [?]), using it in
Eq. 2 is not (except in the special case specified below).

It is perhaps worth emphasizing the prevalent use of the variance window functions in
equations like Eq. 2. All of the references in ref. [1,6] use it, as do all published
analyses of large numbers of band powers. This use of $W^V_l$ is due to the fact that a large
number of reports of band-power constraints do not include $W^B_l$, but only $W^V_l$
(e.g. [?]).



III. THE BAND-POWER WINDOW FUNCTION 

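A minimum-variance, unbiased estimate of the power spectrum is given by [?]

$$\mathcal{C}_l = \frac{1}{2} \sum_{l'} F^{-1}_{ll'}\, \mathrm{Tr}\left[ \left( \Delta\Delta^T - N \right) C^{-1} \frac{\partial S}{\partial \mathcal{C}_{l'}} C^{-1} \right] \qquad (4)$$

where

$$F_{ll'} = \frac{1}{2}\, \mathrm{Tr}\left[ C^{-1} \frac{\partial S}{\partial \mathcal{C}_l} C^{-1} \frac{\partial S}{\partial \mathcal{C}_{l'}} \right] \qquad (5)$$

is called the Fisher matrix. Here $\Delta$ is the data vector, $N$ is the noise covariance
matrix and $C = S + N$ is the total covariance matrix.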

If we are only interested in estimating the amplitude of a power spectrum that we assume to
be flat ($\mathcal{C}_l = \mathcal{C}_B = $ constant), we can rewrite the minimum-variance
estimator for $\mathcal{C}_l$ (Eq. 4) as

$$\mathcal{C}_B = \frac{1}{2} F^{-1}_{BB}\, \mathrm{Tr}\left[ \left( \Delta\Delta^T - N \right) C^{-1} \frac{\partial S}{\partial \mathcal{C}_B} C^{-1} \right] \qquad (6)$$

where $F_{BB} = \sum_{ll'} F_{ll'}$ (because
$\partial S/\partial \mathcal{C}_B = \sum_l \partial S/\partial \mathcal{C}_l$). Equation 6
can be viewed as the directions that take one from the data, $\Delta$, to the meeting point
of the band-power.

We now must provide the directions to go from a theory to the band-power. With the usual
assumptions of Gaussianity and statistical isotropy, theories are completely specified by
their angular power spectrum, $C_l$. We are therefore after the expectation value of the
$\mathcal{C}_B$ of Eq. 6, under the assumption that the true power spectrum is
$\mathcal{C}_l$. Calculation of the dependence of this expectation value on $\mathcal{C}_l$
will provide the directions we need.

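The expectation value is easily calculated after noting that
$\langle \Delta\Delta^T - N \rangle = S = \sum_l \mathcal{C}_l\, \partial S/\partial \mathcal{C}_l$;
it is given by

$$\langle \mathcal{C}_B \rangle = \frac{1}{2} F^{-1}_{BB} \sum_l \mathcal{C}_l\, \mathrm{Tr}\left[ \frac{\partial S}{\partial \mathcal{C}_l} C^{-1} \frac{\partial S}{\partial \mathcal{C}_B} C^{-1} \right] \qquad (7)$$

which further simplifies to

$$\langle \mathcal{C}_B \rangle = \frac{\sum_l \mathcal{C}_l \sum_{l'} F_{ll'}}{\sum_{ll'} F_{ll'}} \qquad (8)$$

which implicitly defines the band-power window function:

$$\frac{W^B_l}{l} = \frac{\sum_{l'} F_{ll'}}{\sum_{ll'} F_{ll'}} \qquad (9)$$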



We have found our linear relationship between the expected value of $\mathcal{C}_B$ and the
assumed power spectrum, $\mathcal{C}_l$. We note that it has a form we might have guessed: an
inverse-variance weighted sum of the $\mathcal{C}_l$. We can identify it as such because the
Fisher matrix also serves as an approximation to the inverse of the covariance matrix of the
uncertainty in the $\mathcal{C}_l$ estimates.
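
As a small numerical illustration of this inverse-variance weighting, the sketch below computes the window function as the row sums of a Fisher matrix divided by the sum of all its elements. The 3x3 matrix and the spectra are made-up numbers chosen only to make the arithmetic easy to follow, and `band_power_window` is a hypothetical helper name.

```python
import numpy as np

def band_power_window(fisher):
    """W^B_l / l = sum_l' F_ll' / sum_ll' F_ll' (row sums over the grand total)."""
    fisher = np.asarray(fisher, dtype=float)
    return fisher.sum(axis=1) / fisher.sum()

# Hypothetical Fisher matrix with correlations between neighbouring multipoles.
fisher = np.array([[4.0, 1.0, 0.0],
                   [1.0, 2.0, 0.5],
                   [0.0, 0.5, 1.0]])
w_over_l = band_power_window(fisher)

flat = np.full(3, 2000.0)                     # flat spectrum
tilted = np.array([1000.0, 2000.0, 3000.0])   # rising spectrum

print(w_over_l)            # weights 0.5, 0.35, 0.15 (row sums 5, 3.5, 1.5 over 10)
print(w_over_l @ flat)     # a flat spectrum is returned unchanged
print(w_over_l @ tilted)   # pulled towards the best-measured multipole
```

The tilted spectrum has an unweighted mean of 2000, but the predicted band-power is lower because the first multipole carries the most Fisher information.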

For band-powers derived from an estimator, the derivation of the band-power window function is
quite straightforward: one simply calculates the expectation value, given $\mathcal{C}_l$, as
done above. However, band-powers are often determined instead by finding the maximum of the
likelihood function, rather than by the quadratic estimator of Eq. 6. The maximum-likelihood
estimate is a complicated, non-quadratic function of the data and its expectation value is not
easy to calculate. In fact, for a maximum-likelihood estimate the band-power window function
is ill-defined because the relationship between the maximum-likelihood estimate and
$\mathcal{C}_l$ is non-linear.

Nevertheless, the above expressions for the band-power window function are still useful for
maximum-likelihood band-power estimates. This is due to the relationship between likelihood
analysis and the quadratic estimator of Eq. 6 pointed out in [?]: used iteratively, Eq. 6
results in the band-power that maximizes the likelihood. If the $\mathcal{C}_l$ assumed for
the right-hand side of Eq. 4 is at least roughly consistent with the data (and "smooth" [?])
then a single iteration of Eq. 4 will produce a very good approximation to the
maximum-likelihood estimate. Thus, the band-power window function is appropriate for
$\mathcal{C}_l$ sufficiently close to the most likely power spectrum. Further away,
non-linear corrections will become important. One could, in principle, calculate these
non-linear corrections, but the improved precision is probably not worth the additional
complication.
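
The link between the iterated quadratic estimator and the likelihood maximum can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the procedure of the cited works: the 100-point dataset, the Gaussian-shaped signal correlation, the white noise and all function names are hypothetical. It iterates the single-band estimator and then evaluates the slope of the Gaussian log-likelihood, which should vanish at the maximum.

```python
import numpy as np

def quadratic_step(data, shape, noise, amplitude):
    """Apply the single-band quadratic estimator once.

    The covariance is modelled as C = amplitude * shape + noise, so `shape`
    plays the role of dS/dC_B for a flat band.
    """
    cov_inv = np.linalg.inv(amplitude * shape + noise)
    weight = cov_inv @ shape @ cov_inv              # C^-1 dS/dC_B C^-1
    fisher = 0.5 * np.trace(weight @ shape)         # F_BB
    return 0.5 * np.trace((np.outer(data, data) - noise) @ weight) / fisher

def likelihood_slope(data, shape, noise, amplitude):
    """d ln L / d amplitude for zero-mean Gaussian data; zero at the maximum."""
    cov = amplitude * shape + noise
    cov_inv = np.linalg.inv(cov)
    return 0.5 * np.trace((np.outer(data, data) - cov) @ cov_inv @ shape @ cov_inv)

# Hypothetical toy dataset: smooth signal correlations plus white noise.
rng = np.random.default_rng(0)
n = 100
x = np.arange(n)
shape = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 3.0) ** 2)
noise = np.eye(n)
true_amplitude = 4.0
data = np.linalg.cholesky(true_amplitude * shape + noise) @ rng.standard_normal(n)

amplitude = 1.0                     # deliberately poor starting guess
for _ in range(50):
    amplitude = quadratic_step(data, shape, noise, amplitude)

# At a fixed point of the iteration the likelihood slope vanishes.
print(amplitude, likelihood_slope(data, shape, noise, amplitude))
```

A fixed point of the iteration satisfies `quadratic_step(a) == a`, which is algebraically the same condition as a vanishing likelihood slope, so the converged amplitude is the maximum-likelihood one.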

We emphasize that Eq. 8 truly does specify a linear relationship between the quadratically
estimated $\langle \mathcal{C}_B \rangle$ and $\mathcal{C}_l$. One might suspect that there
are other dependencies on $\mathcal{C}_l$ hidden in the window function itself. However, the
Fisher matrix that appears twice in Eq. 8 is that for a flat power spectrum with amplitude
$\mathcal{C}_B$, and does not depend on $\mathcal{C}_l$.



IV. THREE EXAMPLES 

Consider an experiment that maps the whole sky with a Gaussian beam with full width at half
max $= \sqrt{8 \ln 2}\, \sigma_b$, and a uniform noise level specified by a weight-per-solid
angle, $w$. In this case of uniform noise and full-sky coverage, the Fisher matrix can be
calculated analytically and is given by [?]

$$F_{ll'} = \frac{2l+1}{2} \left[ \mathcal{C}_l + \frac{l(l+1)}{2\pi w B^2(l)} \right]^{-2} \delta_{ll'} \qquad (10)$$

where $B(l) = e^{-l^2 \sigma_b^2/2}$. Due to the $\delta_{ll'}$, the sum over $l'$ is trivial
and the band-power window function is

$$\frac{W^B_l}{l} \propto \frac{2l+1}{\left[ \mathcal{C}_l + \frac{l(l+1)}{2\pi w B^2(l)} \right]^2} \qquad (11)$$



Thus, for $\mathcal{C}_l$ constant, we see that the band-power window function for this map
is proportional to $l^2$ at low $l$ and then eventually drops very rapidly at higher $l$ where
it is proportional to $B^4(l)/l^2$.

This behavior of $W^B_l$ is intuitively reasonable. Cosmic variance is the reason that the
very low $l$s are less important to the overall determination of the band-power, and
instrument noise suppresses the importance of the very high $l$s.

Contrast this behavior to that of the variance window function. For a map, $W^V_l$ is simply
given by the square of the spherical harmonic transform of the beam, $B(l)$. Therefore

$$W^V_l \propto B^2(l). \qquad (12)$$

Note that this implies that the most important moments are the ones at lowest $l$! Further,
there is no dependence on the noise level. For the band-power window function we see that as
the noise is lowered ($w$ raised), the importance of the higher $l$ moments increases.
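
The contrast between the two window functions for a full-sky map can be made concrete with a short sketch. It evaluates $W^B_l/l \propto (2l+1)/[\mathcal{C}_l + l(l+1)/(2\pi w B^2(l))]^2$ and $W^V_l/l \propto B^2(l)/l$ for a flat spectrum; the beam size, band-power amplitude and noise weights are hypothetical values chosen for illustration, and the Gaussian beam form $B(l) = e^{-l^2\sigma_b^2/2}$ is assumed.

```python
import numpy as np

def full_sky_windows(ells, band_power, weight, fwhm_rad):
    """Return normalised W^B_l / l and W^V_l / l for a uniform-noise full-sky map."""
    sigma_b = fwhm_rad / np.sqrt(8.0 * np.log(2.0))
    beam = np.exp(-0.5 * ells**2 * sigma_b**2)                  # B(l)
    noise = ells * (ells + 1) / (2.0 * np.pi * weight * beam**2)
    band = (2.0 * ells + 1.0) / (band_power + noise) ** 2        # band-power window
    variance = beam**2 / ells                                    # variance window over l
    return band / band.sum(), variance / variance.sum()

ells = np.arange(2, 1501, dtype=float)
fwhm = np.radians(0.5)        # hypothetical 0.5 degree beam
band_power = 2000.0           # flat spectrum amplitude in (uK)^2

for weight in (1.0e4, 1.0e6):  # hypothetical weights in uK^-2 sr^-1
    w_band, w_var = full_sky_windows(ells, band_power, weight, fwhm)
    # Peak multipole of each window, and the mean multipole of the band-power window.
    print(weight, ells[np.argmax(w_band)], ells[np.argmax(w_var)], np.sum(ells * w_band))
```

The variance window is identical for both weights and peaks at the lowest multipole, while the band-power window moves to higher $l$ as the weight increases.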

Our second example is for a dataset with $n$ points that have no signal or noise
correlations. We leave it as an exercise for the reader to show that in this case,
$W^B_l = W^V_l$. That is, in the absence of correlations, the two window functions are
equivalent.
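
A numerical check of this second example is sketched below, as a complement to the analytic exercise. It assumes equal diagonal window elements and equal noise for every point, builds the Fisher matrix from the standard Gaussian trace formula $F_{ll'} = \frac{1}{2}\mathrm{Tr}[C^{-1}\,\partial S/\partial\mathcal{C}_l\, C^{-1}\,\partial S/\partial\mathcal{C}_{l'}]$, and compares the resulting band-power window with the normalised variance window. The variance window shape and all amplitudes are hypothetical.

```python
import numpy as np

def fisher_matrix(cov, dS_dC):
    """F_ll' = (1/2) Tr[C^-1 dS/dC_l C^-1 dS/dC_l'] for a stack of derivative matrices."""
    cov_inv = np.linalg.inv(cov)
    products = np.array([cov_inv @ d for d in dS_dC])   # C^-1 dS/dC_l, one per l
    # einsum contracts both matrix indices, giving Tr[A_l A_l'] for every pair (l, l').
    return 0.5 * np.einsum("aij,bji->ab", products, products)

n_points = 6
ells = np.arange(2, 12, dtype=float)
w_var_over_l = np.exp(-0.5 * (0.2 * ells) ** 2) / ells    # hypothetical W^V_l / l
dS_dC = w_var_over_l[:, None, None] * np.eye(n_points)    # diagonal: no signal correlations
signal = 1500.0 * dS_dC.sum(axis=0)                       # flat spectrum of 1500 (uK)^2
noise = 400.0 * np.eye(n_points)                          # uncorrelated noise

fisher = fisher_matrix(signal + noise, dS_dC)
w_band_over_l = fisher.sum(axis=1) / fisher.sum()

# The two windows agree once both are normalised to unit sum.
print(np.allclose(w_band_over_l, w_var_over_l / w_var_over_l.sum()))
```

With diagonal matrices the Fisher matrix factorises as an outer product of $W^V_l/l$ with itself, which is why the row sums reproduce the variance window.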

Our final example is from the Medium Scale Anisotropy Measurement (MSAM) 3-year dataset [?].
This dataset was reduced to measurements of the sky with two different beam maps, called
single-difference and double-difference. The high signal-to-noise and dense sampling of the
dataset mean that it is sensitive to $\mathcal{C}_l$ at somewhat higher values of $l$ than one
would infer from the variance window function. See Fig. 1. For the single-difference
measurements, the results are especially striking: the peak is shifted from $l = 120$ to
$l = 160$ and at $l = 400$, where there is a second local peak, $W^B_l$ is about 5 times
larger than $W^V_l$. The band-power window functions were calculated assuming a flat power
spectrum with amplitude consistent with the data of $\mathcal{C}_B = 2000\,(\mu{\rm K})^2$.

FIG. 1. The band-power window functions (solid lines) and variance window functions (dashed
lines) for the single and double difference MSAM beam maps ($f_l = W_l/l$). The normalization
here is arbitrary.



V. MULTIPLE BANDS 

In the above we have assumed a flat power spectrum. This is overly restrictive given that we
wish to determine the presence of features in this power spectrum! However, the above easily
generalizes to the case where the power spectrum is determined in multiple flat bands, each of
finite extent in $l$.

Some datasets have sufficient dynamic range to estimate the power spectrum in more than one
band. For these datasets we can parameterize the power spectrum, $\mathcal{C}_l$, with the
power in bands enumerated by the subscript $B$:

$$\mathcal{C}_l = \sum_B \chi_B(l)\, \mathcal{C}_B \qquad (13)$$

where $\mathcal{C}_B$ is the amplitude of the power spectrum within band $B$ and

$$\chi_B(l) = \begin{cases} 1 & : \; l_<(B) < l < l_>(B) \\ 0 & : \; \text{otherwise} \end{cases} \qquad (14)$$

where $l_<(B)$ and $l_>(B)$ delimit the range of band $B$. For each of these bands we can
calculate a band-power window function via

$$\frac{W^B_l}{l} = \frac{\sum_{l'} F_{ll'}}{\sum_{ll'} F_{ll'}} \qquad (15)$$

where the sums over $l'$ run only from $l_<(B)$ to $l_>(B)$.
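
The per-band window functions can be sketched as follows. Two choices here are assumptions made for illustration rather than statements from the text: the band edges are treated as inclusive, and the sum over $l$ in the denominator runs over all multipoles so that each window sums to one. The Fisher matrix and band layout are hypothetical.

```python
import numpy as np

def multi_band_windows(fisher, ells, band_edges):
    """Return an array (n_bands, n_ell) of W^B_l / l, one row per band."""
    windows = []
    for l_lo, l_hi in band_edges:
        in_band = (ells >= l_lo) & (ells <= l_hi)      # assumed inclusive band edges
        row_sums = fisher[:, in_band].sum(axis=1)      # sum over l' restricted to band B
        windows.append(row_sums / row_sums.sum())      # normalise over all l (assumption)
    return np.array(windows)

# Hypothetical Fisher matrix coupling neighbouring multipoles, for l = 2..9.
ells = np.arange(2, 10)
fisher = 2.0 * np.eye(8) + 0.5 * (np.eye(8, k=1) + np.eye(8, k=-1))
band_edges = [(2, 5), (6, 9)]

windows = multi_band_windows(fisher, ells, band_edges)
print(windows[0])   # small but non-zero weight at l = 6, just outside band 1
print(windows[1])
```

The off-diagonal Fisher elements make each window leak slightly beyond its nominal band, which is exactly the information a top-hat description of the band would miss.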

We could remove the need for window functions by making the bands very narrow since
sufficiently narrow bands ensure that the sensitivity to each $\mathcal{C}_l$ within the band
is approximately independent of $l$. However, making the bands too narrow makes the error bars
very large and highly correlated. This is undesirable for two reasons. First, it hinders
visual interpretation of the results and second, the larger the error bars, the more important
are the non-Gaussian aspects of the distribution. And although the ansatzes of [?] for this
non-Gaussian distribution have been shown to work quite well in some cases, it is not clear
how well they work in all cases. Therefore, broad bands may be desirable and the sensitivity
to $\mathcal{C}_l$ may vary appreciably across the band. In such cases window functions,
$W^B_l$, tell us the in-band sensitivity.

For experiments with small sky coverage, calculation of the elements of the Fisher matrix at
every $l$ is not necessary. If the largest extent of the field is $\Delta\theta$ then
$P_l(\cos\theta)$ and $P_{l+\delta l}(\cos\theta)$ are close to indistinguishable if
$\delta l < \pi/\Delta\theta$. Thus one can choose a fine binning, enumerated by $b$, within
each coarse band, $B$, and assume that $F_{ll'}$ is constant for all $l$ and $l'$ within
band $b$.

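It sometimes may not be practical to calculate the Fisher matrix for individual multipole
moments. It may be easier to parameterize the power spectrum in terms of these fine bins,

$$\mathcal{C}_l = \sum_b \chi_b(l)\, \mathcal{C}_b \qquad (16)$$

and then calculate $F_{bb'}$ where

$$F_{bb'} = \frac{1}{2}\, \mathrm{Tr}\left[ C^{-1} \frac{\partial S}{\partial \mathcal{C}_b} C^{-1} \frac{\partial S}{\partial \mathcal{C}_{b'}} \right] \qquad (17)$$

Then one assumes that $F_{ll'} = F_{bb'}/(\delta l(b)\, \delta l(b'))$, where $\delta l(b)$
is the width in $l$ of fine band $b$. We divide by the widths because
$W_l = W_b/\delta l(b)$.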
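
The expansion from fine bins to individual multipoles can be sketched in a few lines. The function name, the 2x2 fine-bin Fisher matrix and the bin widths are hypothetical; the only rule implemented is the one stated above, that every $(l, l')$ pair in bins $(b, b')$ receives $F_{bb'}/(\delta l(b)\,\delta l(b'))$.

```python
import numpy as np

def expand_fine_bin_fisher(fisher_bins, bin_widths):
    """Spread each F_bb' evenly over the multipoles of fine bins b and b'."""
    widths = np.asarray(bin_widths)                       # integer widths delta-l(b)
    per_multipole = fisher_bins / np.outer(widths, widths)
    # Repeat each row and column once per multipole contained in the bin.
    return np.repeat(np.repeat(per_multipole, widths, axis=0), widths, axis=1)

fisher_bins = np.array([[8.0, 2.0],
                        [2.0, 9.0]])     # hypothetical fine-bin Fisher matrix
bin_widths = [2, 3]                      # bin 1 spans 2 multipoles, bin 2 spans 3

fisher_ll = expand_fine_bin_fisher(fisher_bins, bin_widths)
print(fisher_ll.shape)                        # one row and column per multipole
print(fisher_ll.sum(), fisher_bins.sum())     # the total Fisher information is preserved
```

Dividing by both widths guarantees that summing the expanded matrix over the multipoles of any pair of bins returns the original $F_{bb'}$.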

VI. OTHER SHAPES

It may be the case that an experiment reports a single measure of the power, but does so
assuming a non-flat power spectrum shape. An historical example is the Gaussian
auto-correlation function. Another possibility is that of an experiment measuring near the
damping tail of the power spectrum, where assuming a flat shape may be a very bad
approximation. Therefore we ask, if the amplitude of a non-flat power spectrum is estimated
from the data, how can we calculate the theoretical predictions for this quantity?

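To frame the question more precisely, we assume that $Q$ is calculated from a data set via

$$Q = \frac{1}{2} F^{-1}_{QQ}\, \mathrm{Tr}\left[ \left( \Delta\Delta^T - N \right) C^{-1} \frac{\partial S}{\partial Q} C^{-1} \right] \qquad (18)$$

where the power spectrum is assumed to be of the form $Q\, \mathcal{C}_l^{\rm shape}$. Using
similar manipulations as before, we find the expectation value of $Q$, under the assumption
that the true power spectrum is $\mathcal{C}_l$, to be

$$\langle Q \rangle = \frac{\sum_{ll'} \mathcal{C}_l F_{ll'} \mathcal{C}_{l'}^{\rm shape}}{\sum_{ll'} \mathcal{C}_l^{\rm shape} F_{ll'} \mathcal{C}_{l'}^{\rm shape}} \equiv \sum_l \frac{W^Q_l}{l}\, \mathcal{C}_l. \qquad (19)$$

Thus we have the prescription for comparing the estimated amplitude to the predicted
amplitude.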

Note that there is no clearly preferable means of converting the estimate of the amplitude,
$Q$, into a measure of average power. One possible prescription is:

$$\mathcal{C}_Q = Q\, \frac{\sum_l (W^Q_l/l)\, \mathcal{C}_l^{\rm shape}}{\sum_l (W^Q_l/l)} \qquad (20)$$

Such a conversion is only useful for plotting purposes. The ambiguity in the choice of
normalization does not disturb our ability to confront data with theory. As long as we know
$W^Q_l$, and the shape assumed, $\mathcal{C}_l^{\rm shape}$, we can make the theoretical
prediction for $Q$.
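
A sketch of the prediction for $Q$ follows. The window expression used here, $W^Q_l/l = \sum_{l'} F_{ll'}\mathcal{C}^{\rm shape}_{l'} / \sum_{ll'} \mathcal{C}^{\rm shape}_l F_{ll'} \mathcal{C}^{\rm shape}_{l'}$, is an assumption obtained by repeating the flat-band manipulations with a shape factor, and should be checked against the experiment's own definition. The Fisher matrix, the shape and the spectra are hypothetical numbers.

```python
import numpy as np

def shape_window(fisher, shape):
    """W^Q_l / l for an amplitude Q multiplying an assumed spectral shape (see lead-in)."""
    weights = fisher @ shape                 # sum_l' F_ll' shape_l'
    return weights / (shape @ weights)       # divide by sum_ll' shape_l F_ll' shape_l'

fisher = np.array([[4.0, 1.0, 0.0],
                   [1.0, 2.0, 0.5],
                   [0.0, 0.5, 1.0]])         # hypothetical Fisher matrix
shape = np.array([1.0, 0.8, 0.5])            # hypothetical falling spectral shape
w_q = shape_window(fisher, shape)

theory = 3.0 * shape                         # a theory with exactly the assumed shape
other = np.array([2000.0, 1500.0, 1500.0])   # a theory with a different shape

print(w_q @ theory)   # recovers the input amplitude of 3, up to rounding
print(w_q @ other)    # predicted Q for the differently shaped theory
```

With this normalization a theory of the assumed shape returns its own amplitude, the analogue of a flat spectrum returning its own band-power.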



VII. DATA REPORTING RECIPE 

To summarize, our prescription for reporting power spectrum constraints is as follows:

1) Parameterize the power spectrum via Eq. 13 for some choice of bands.

2) Find the $\mathcal{C}_B$ that maximize the likelihood function.

3) Calculate the curvature matrix for these bands, $\mathcal{F}_{BB'}$, and also the
log-normal offsets $x_B$ defined in [?].

4) Calculate the band-power window functions, $W^B_l$, from $F_{ll'}$ (which can be
calculated either via Eq. 5 or Eq. 17).

Steps 1 through 3 have been spelled out in more detail in [?]. We have no general
prescription for the best parameterization of the power spectrum to use for a given dataset
(e.g., how many bands to use and whether or not to assume a flat shape). We expect that
assuming a flat spectrum across each band will be a reasonable choice in most situations.
Whatever parameterization is chosen, the analyst should ensure that, in addition to the
estimate and its uncertainties, he or she also provides the means with which to convert a
theoretical power spectrum into a prediction for that estimate.



VIII. DISCUSSION 

As emphasized in [?], approximate methods for simultaneous analysis of all relevant CMB data
are a practical necessity. The use of band-power window functions, instead of variance window
functions, will improve the validity of the commonly used method of radical compression to
band-powers.



The persistence of the use of variance window functions as opposed to band-power window
functions (without even acknowledgment that this is, at best, an approximation) is possibly
attributable to the fact that maximum-likelihood estimates have a very complicated dependence
on the data and, in fact, do not even have strictly well-defined band-power window functions.
This conjecture is supported by the fact that analyses using quadratic estimators have not
suffered from this confusion, while almost all of those using likelihood analysis have. It
should also be noted that in analyses of galaxy redshift surveys, where quadratic estimators
are generally used to estimate the matter power spectrum, $P(k)$, the correct form of the
window function is generally used, e.g., [?].

As signal-to-noise rises, it becomes increasingly important to use the correct window
function. This is because the power spectrum estimate becomes sensitive to more pairs of data
points than just the diagonal ones, including the off-diagonal ones with very small signal
matrix elements. One can see from Eq. (7) that in the limit that $N \to 0$, all pairs
(normalized to their expected signal) get weighted equally. In this limit the much more
numerous off-diagonal pairs are extremely important to the determination of the band-power.
As a rough guide, one can compare the size of the largest off-diagonal terms to the noise
level to determine how well the variance window function will approximate the band-power
window function. Also note that the approximation is generally worse for map datasets than
difference datasets due to the fact that differencing reduces the off-diagonal correlations.

Steps toward the proper definition of the band-power window function were taken in [?] where
the diagonal elements of the window function matrix in the s/n basis were used as a means of
determining the sensitivity of an experiment to the power spectrum. Working in the
signal-to-noise eigenmode basis [?] reduces the correlations, and we have seen that in the
absence of correlations, the band-power window function is the variance window function. A
similar procedure was used to calculate the window functions for the band-powers determined
from the QMAP maps [18].

As individual datasets become more powerful in their ability to determine $C_l$, they will
report constraints in very narrow bands, decreasing the need for window functions which
describe the in-band sensitivity. However, a useful role for window functions may remain for
quite some time at both the low $l$ and high $l$ extremes of a dataset's power spectrum
sensitivity. At these extremes one will need broad bands in order to have small error bars,
and thus one will wish to know the shape of the in-band sensitivity.